Adversarial examples
Adversarial examples are small perturbations of an input that are negligible to humans but change the decision of a computer system. They were first discovered in object recognition (Szegedy et al., 2014)Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow, Ian J., and Fergus, Rob. Intriguing properties of neural networks. ICLR, 2014. URL http://arxiv.org/abs/1312.6199. and later found in natural language systems as well (Jia and Liang, 2017)Jia, Robin, and Percy Liang. "Adversarial Examples for Evaluating Reading Comprehension Systems." arXiv preprint arXiv:1707.07328 (2017).. The phenomenon was broadly popularized via news about autonomous cars misinterpreting stop signs as speed limit signs, and about state-of-the-art computer vision systems misinterpreting cats as desktop computers, mistaking faces for non-faces, gibberish patterns for faces, and one face for another. It reveals a fundamental flaw in a large class of classifiers (Goodfellow et al., 2014)Goodfellow, Ian J., Jonathon Shlens, and Christian Szegedy. "Explaining and harnessing adversarial examples." arXiv preprint arXiv:1412.6572 (2014)..

TODO:
* https://pdfs.semanticscholar.org/7330/0838d524d062e8341b242765fb6efaf48f43.pdf
* https://www.cs.uoregon.edu/Reports/AREA-201406-Torkamani.pdf
* https://arxiv.org/pdf/1207.0245.pdf

Subspaces of transferable adversarial examples: Tramèr et al. (2017)Tramèr, Florian, et al. "The Space of Transferable Adversarial Examples." arXiv preprint arXiv:1704.03453 (2017).

Universal adversarial perturbations (a single perturbation that fools the classifier on most natural inputs): Moosavi-Dezfooli et al. (2017), https://arxiv.org/pdf/1610.08401.pdf

History

From Goodfellow (2017):
# "Adversarial Classification" (Dalvi et al., 2004): fool spam filters
# "Evasion Attacks Against Machine Learning at Test Time" (Biggio et al., 2013): fool neural nets
# Szegedy et al. (2013): fool ImageNet classifiers imperceptibly
# Goodfellow et al. (2014): cheap, closed-form attack (FGSM)

Explanation
* Linearity: Goodfellow et al. (2014). Deep models are locally too linear, so in high dimensions many tiny per-coordinate changes aligned with the gradient add up to a large change in the output; this view yields the closed-form FGSM attack sketched after this section.
* Data complexity of robust generalization (with no prior at all? what about robust generalization with the right prior?): Schmidt et al. (2018)Schmidt, L., Talwar, K., Santurkar, S., Tsipras, D., & Madry, A. (2018). Adversarially robust generalization requires more data. Advances in Neural Information Processing Systems (NeurIPS), 5014–5026.
* "Identifying a robust classifier from limited training data is information theoretically possible but computationally intractable" (at least for a family of models called "statistical query"): Bubeck et al. (2018)Bubeck, S., Price, E., & Razenshteyn, I. (2018). Adversarial examples from computational constraints. http://arxiv.org/abs/1805.10204
* High-dimensional geometry of the data manifold (but hey, people can do it...): Gilmer et al. (2018)Gilmer, J., Metz, L., Faghri, F., Schoenholz, S. S., Raghu, M., Wattenberg, M., & Goodfellow, I. (2018). The Relationship Between High-Dimensional Geometry and Adversarial Examples. http://arxiv.org/abs/1801.02774
* An inevitable consequence of "concentration of measure" in metric measure spaces (but does our problem have it?): Mahloujifar et al. (2019)Mahloujifar, S., Diochnos, D. I., & Mahmoody, M. (2019). The Curse of Concentration in Robust Learning: Evasion and Poisoning Attacks from Concentration of Measure. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 4536–4543. https://doi.org/10.1609/aaai.v33i01.33014536
* Non-robust features (of the input) that are useful for normal classification but not for robust classification: Ilyas et al. (2019)Ilyas, A., Santurkar, S., Tsipras, D., Engstrom, L., Tran, B., & Madry, A. (2019). Adversarial Examples Are Not Bugs, They Are Features. http://arxiv.org/abs/1905.02175
** "We define a feature to be a function mapping from the input space X to the real numbers, ... Note that this formal definition also captures what we abstractly think of as features (e.g., we can construct an f that captures how "furry" an image is)"

Some claim that adversarial examples are inevitable (yet humans seem to be robust against them?):
* Alhussein Fawzi, Hamza Fawzi, and Omar Fawzi. Adversarial vulnerability for any classifier, 2018. URL https://arxiv.org/abs/1802.08686
* Justin Gilmer, Luke Metz, Fartash Faghri, Sam Schoenholz, Maithra Raghu, Martin Wattenberg, and Ian Goodfellow. Adversarial spheres. In International Conference on Learning Representations Workshop, 2018. URL https://arxiv.org/abs/1801.02774
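The explanations above share one formalization. As a sketch of the standard setup (the notation f, L, epsilon, rho follows common usage, e.g., Schmidt et al. (2018); the last display paraphrases Ilyas et al.'s (2019) definition):

```latex
% Robust (adversarial) risk of classifier f under loss L and an
% l-infinity perturbation budget epsilon (as in, e.g., Schmidt et al., 2018):
\[
  \mathrm{Risk}_{\mathrm{adv}}(f)
    = \mathbb{E}_{(x,y)\sim\mathcal{D}}
      \Big[ \max_{\|\delta\|_\infty \le \epsilon} L\big(f(x+\delta),\, y\big) \Big]
\]
% The linearity explanation yields the closed-form one-step FGSM attack:
\[
  x_{\mathrm{adv}} = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x L(f(x),\, y)\big)
\]
% Ilyas et al. (2019) call a feature f : X -> R "rho-useful" (for labels
% y in {-1, +1}) when it correlates with the label in expectation:
\[
  \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\, y \cdot f(x) \,\big] \ge \rho
\]
```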
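As referenced in the linearity entry above, a minimal FGSM sketch in PyTorch; the toy model interface, the batch (x, y), and the [0, 1] input range are assumptions for illustration, not details from the paper:

```python
import torch
import torch.nn.functional as F

def fgsm(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
         eps: float) -> torch.Tensor:
    """One-step attack of Goodfellow et al. (2014): x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)  # the loss the attacker wants to increase
    loss.backward()
    # Linearity view: step every input coordinate by eps in the direction
    # that locally increases the loss the most.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assume inputs live in [0, 1]
```

Because this is a single gradient-sign step it is cheap to run; on undefended image classifiers even a small budget such as eps = 8/255 typically flips many predictions.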
Adversarial examples in natural language processing

Tasks

Classifying text into categories (e.g., Sports, Business) and reviews into good/bad (Soll et al., 2019)Soll, M., Hinz, T., Magg, S., & Wermter, S. (2019). Evaluating Defensive Distillation for Defending Text Processing Neural Networks Against Adversarial Examples. International Conference on Artificial Neural Networks (ICANN), 685–696. https://doi.org/10.1007/978-3-030-30508-6_54

Attacks

From Soll et al. (2019): the "algorithm by Samanta and Mehta [22], where the candidate pool P, from which possible words for insertion and replacement are drawn, was created from the following sources:
* Synonyms gathered from the WordNet dataset [5],
* Typos from a dataset [16] to ensure that the typos inserted are not recognized as artificial since they occur in normal texts written by humans, and
* Keywords specific for one input class which were found by looking at all training sentences and extracting words only found in one class."
(A candidate-pool sketch appears at the end of this page.)

Defenses

Distillation is shown to be ineffective (again) (Soll et al., 2019).

TODO: Jia and Liang (2017)Jia, R., Liang, P.: Adversarial examples for evaluating reading comprehension systems. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 2021–2031 (2017). DOI: 10.18653/v1/D17-1215: data augmentation not effective?

Evaluation

CIFAR-10

TODO: Qin et al. (2019)Qin, C., Martens, J., Gowal, S., Krishnan, D., Dvijotham, K., … Kohli, P. (2019). Adversarial Robustness through Local Linearization. NeurIPS. http://arxiv.org/abs/1907.02610

Failed defenses
* Distillation: Papernot et al. (2016) (shown to perform gradient obfuscation and broken in Athalye et al. (2018)Athalye, A., Carlini, N., & Wagner, D. (2018). Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples. 35th International Conference on Machine Learning (ICML), 436–448.)
* Ensembling: https://arxiv.org/abs/1706.04701

Software
* Cleverhans (see the usage sketch below)
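A usage sketch for CleverHans; the module paths and signatures below follow the 4.x PyTorch API as I recall it and should be treated as assumptions to check against the installed version:

```python
import torch
from cleverhans.torch.attacks.fast_gradient_method import fast_gradient_method
from cleverhans.torch.attacks.projected_gradient_descent import projected_gradient_descent

# Toy stand-in model; any torch.nn.Module returning logits should work.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(4, 3, 32, 32)  # a random CIFAR-10-shaped batch

# FGSM: one gradient-sign step with budget eps under the given norm.
x_fgsm = fast_gradient_method(model, x, eps=8 / 255, norm=float("inf"))

# PGD: iterated FGSM projected back onto the eps-ball; a stronger
# first-order attack commonly used to evaluate defenses.
x_pgd = projected_gradient_descent(
    model, x, eps=8 / 255, eps_iter=2 / 255, nb_iter=10, norm=float("inf"))
```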
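Separately, a hypothetical sketch of building a Samanta-and-Mehta-style candidate pool for the text attack quoted under Attacks: WordNet synonyms via NLTK plus class-specific keywords; the typo-dataset source is omitted and all function names here are mine, not from the paper:

```python
# Requires NLTK with the WordNet corpus (run nltk.download("wordnet") once).
from collections import defaultdict
from nltk.corpus import wordnet

def wordnet_synonyms(word):
    """Single-word WordNet synonyms of `word`, excluding the word itself."""
    synonyms = set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            name = lemma.name().replace("_", " ")
            if name.lower() != word.lower() and " " not in name:
                synonyms.add(name)
    return synonyms

def class_keywords(texts_by_class):
    """Words that appear in the training texts of exactly one class."""
    classes_of = defaultdict(set)  # word -> set of classes it occurs in
    for label, texts in texts_by_class.items():
        for text in texts:
            for word in set(text.lower().split()):
                classes_of[word].add(label)
    return {w: next(iter(cs)) for w, cs in classes_of.items() if len(cs) == 1}

# Example: synonym candidates for one word of a review.
pool = wordnet_synonyms("good")
```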